ProtoNet: hierarchical classification of the protein space
نویسندگان
چکیده
The ProtoNet site provides an automatic hierarchical clustering of the SWISS-PROT protein database. The clustering is based on an all-against-all BLAST similarity search. The similarities' E-score is used to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. ProtoNet (version 1.3) is accessible in the form of an interactive web site at http://www.protonet.cs.huji.ac.il. ProtoNet provides navigation tools for monitoring the clustering process with a vertical and horizontal view. Each cluster at any level of the hierarchy is assigned with a statistical index, indicating the level of purity based on biological keywords such as those provided by SWISS-PROT and InterPro. ProtoNet can be used for function prediction, for defining superfamilies and subfamilies and for large-scale protein annotation purposes.
منابع مشابه
ProtoNet : Navigating the Hierarchical Clustering of the Protein Space
The ProtoNet site provides an automatic hierarchical clustering of the protein space. The clustering is based on an all-against-all BLAST similarity test. With this similarity measure we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy...
متن کاملProtoNet 4.0: A hierarchical classification of one million protein sequences
ProtoNet is an automatic hierarchical classification of the protein sequence space. In 2004, the ProtoNet (version 4.0) presents the analysis of over one million proteins merged from SwissProt and TrEMBL databases. In addition to rich visualization and analysis tools to navigate the clustering hierarchy, we incorporated several improvements that allow a simplified view of the scaffold of the pr...
متن کاملPredicting fold novelty based on ProtoNet hierarchical classification
MOTIVATION Structural genomics projects aim to solve a large number of protein structures with the ultimate objective of representing the entire protein space. The computational challenge is to identify and prioritize a small set of proteins with new, currently unknown, superfamilies or folds. RESULTS We develop a method that assigns each protein a likelihood of it belonging to a new, yet und...
متن کاملProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree
ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated through an unsupervised bottom-up clustering algorithm. This algorithm organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (Version 6.0) tree includes over 9 million proteins ...
متن کاملEntropy-driven partitioning of the hierarchical protein space
MOTIVATION Modern protein sequencing techniques have led to the determination of >50 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding an optimal level of granularity for the purpose of focusing on protein function...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Nucleic acids research
دوره 31 1 شماره
صفحات -
تاریخ انتشار 2003